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ABSTRACT 

We  consider  high-dimensional  estimation  of  a  (possibly 
sparse)  Kronecker-decomposable  covariance  matrix  given 
i.i.d.  Gaussian  samples.  We  propose  a  sparse  covariance  esti¬ 
mation  algorithm,  Kronecker  Graphical  Lasso  (KGlasso),  for 
the  high  dimensional  setting  that  takes  advantage  of  structure 
and  sparsity.  Convergence  and  limit  point  characterization 
of  this  iterative  algorithm  is  established.  Compared  to  stan¬ 
dard  Glasso,  KGlasso  has  low  computational  complexity  as 
the  dimension  of  the  covariance  matrix  increases.  We  de¬ 
rive  a  tight  MSE  convergence  rate  for  KGlasso  and  show  it 
strictly  outperforms  standard  Glasso  and  FF.  Simulations  val¬ 
idate  these  results  and  shows  that  KGlasso  outperforms  the 
maximum-likelihood  solution  (FF),  in  the  high-dimensional 
small-sample  regime. 

Index  Terms —  sparsity,  structured  covariance  estima¬ 
tion,  penalized  maximum  likelihood,  graphical  lasso 

1.  INTRODUCTION 

Covariance  estimation  is  a  problem  of  great  interest  in  many 
different  disciplines,  including  machine  learning,  signal  pro¬ 
cessing,  economics  and  bioinformatics.  In  this  paper  we 
consider  covariance  estimation  in  the  multivariate  Gaussian 
model  under  the  separable  positive  definite  pf  x  pf  covari¬ 
ance  matrix  assumption: 

So  =  Ao  g)  Bq  (1) 

where  Ao  is  a  p  x  p  positive  definite  matrix  and  Bo  is  an 
/  x  /  positive  definite  matrix.  Model  (1)  arises  in  channel 
modeling  for  MIMO  wireless  communications,  where  Ao  is 
a  transmit  covariance  matrix  and  Bo  is  a  receive  covariance 
matrix,  and  in  other  applications,  see  [1].  Let  ©o  :=  E^1  de¬ 
note  the  inverse  covariance,  or  precision  matrix.  As  compared 
to  the  standard  saturated  (unstructured)  model,  the  number  of 
independent  parameters  in  (1)  is  reduced  from  0(p2/2)  to 
0(p2)  +  0(/2).  Furthermore,  as  shown  [1],  factorization  (1) 
results  in  a  significant  reduction  in  estimation  mean  squared 
error  and  in  estimator  computational  complexity.  In  this  paper 
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we  propose  estimation  of  a  sparse  version  of  the  Kronecker 
product  model  (1)  resulting  in  even  more  significant  perfor¬ 
mance  improvements  than  for  the  saturated  model  studied  in 
[1]. 

Under  model  (1),  the  joint  probability  distribution  of  the 
measurements  can  be  represented  by  an  undirected  graph  Q  = 
(V,£),  where  V  is  the  vertex  set  (each  vertex  corresponding 
to  a  variable)  and  £  is  the  edge  set.  If,  given  all  the  other  vari¬ 
ables,  the  ?'th  variable  is  conditionally  independent  of  the  yth 
variable,  then  (i,j)  £  [2].  Estimating  an  undirected  Gaus¬ 

sian  graphical  model  is  equivalent  to  estimating  the  inverse 
covariance  matrix.  Penalized  likelihood  estimators  for  Gaus¬ 
sian  graphical  models,  such  as  the  graphical  lasso  (Glasso) 
have  been  proposed  [3,  4,  5].  The  maximum-likelihood  (ML) 
estimator  of  the  Kronecker  product  (1)  has  been  studied  in 
[1,  6].  While  the  ML  estimator  has  no  known  closed-form 
solution,  an  approximation  to  the  solution  can  be  iteratively 
computed  via  an  alternating  algorithm:  the  flip-flop  (FF)  al¬ 
gorithm  [1,  6], 

To  our  knowledge,  ML  estimation  for  the  situation  where 
the  Kronecker  component  matrices  are  themselves  sparse  has 
not  been  studied.  In  addition  to  the  Kronecker  factorization, 
we  exploit  sparsity  in  order  to  derive  better  estimators,  espe¬ 
cially  for  the  large-dimension  small-sample  regime.  In  this 
paper,  we  propose  an  l\  -penalized  likelihood  estimator  for 
the  sparse  Kronecker  product  case. 

Statistical  consistency  is  guaranteed,  i.e.,  the  estimator 
converges  in  probability  to  the  true  inverse  covariance  matrix 
©o  asymptotically  as  the  number  of  samples  and  dimensions 
of  Kronecker  factor  matrices  grows  to  infinity.  The  main  con¬ 
tribution  is  the  derivation  of  the  high-dimensional  MSE  con¬ 
vergence  rates  for  KGlasso.  When  both  Kronecker  factors  are 
sparse,  it  is  shown  that  KGlasso  strictly  outperforms  FF  and 
naive  Glasso  in  MSE,  and  the  performance  improvement  can 
be  very  significant.  Simulations  show  that  KGlasso  exhibits 
superior  empirical  performance. 

2.  NOTATION 

For  a  square  matrix  M,  define  |M|i  =  ||vec(M)||1  and 
IMloo  =  ||vec(M)||  ,  where  vec(M)  denotes  the  vectorized 

form  of  M  (concatenation  of  columns  into  a  vector).  ||M||2 
is  the  spectral  norm  of  M.  M.(jj  and  [M]jj  are  the  (*,  j)th  el- 
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ement  of  M.  Let  the  inverse  transformation  (from  a  vector  to 
a  matrix)  be  defined  as:  vec_1(x)  =  X,  where  x  =  vec(X). 
Define  the  pf  x  pf  permutation  operator  K pj  such  that 
Kp  jvec(N)  =  vec(NT)  for  any  px  /  matrix  N.  For  a  sym¬ 
metric  matrix  M,  A(M)  will  denote  the  vector  of  real  eigen¬ 
values  of  M  and  define  Amoa;(M)  =  ||M|j2  =  maxAj(M) 
forp.d.  symmetric  matrix,  and  Amj„(M)  =  min  A,(M). 

For  a  matrix  M  of  size  pf  x  pf,  let  -=1 

denote  its  /  x  /  block  submatrices,  where  each  block  sub¬ 
matrix  is  M  (i,j)  =  [M]  ,  I ./ ,  i  Also  let 

{M(fc,  /)}{ /=1  denote  the  p  x  p  block  submatrices  of  the 
permuted  matrix  M  =  K.p  jMKpj. 

Define  the  set  of  symmetric  matrices  Sp  and  the  set  of 
symmetric  positive  definite  (pd)  matrices  £5. ,  . 

3.  GRAPHICAL  LASSO  FRAMEWORK 

Available  are  n  i.i.d.  multivariate  Gaussian  observations 
{z t}"=1,  where  zt  G  W*  has  zero-mean  and  covariance 
equal  to  S  =  Aq  0  Bq.  The  log-likelihood  function  is 
proportional  to: 

((E)  :=  logdet(S-1)  -  tr(E_1Sn),  (2) 

where  E  is  the  positive  definite  covariance  matrix  and  Sn  = 
-  Y^t=i  ztzt"  is  the  sample  covariance  matrix.  Recent  work 
[7,  8]  has  considered  U -penalized  maximum  likelihood  esti¬ 
mators  for  the  saturated  model  where  S  belongs  to  the  unre¬ 
stricted  cone  of  positive  definite  matrices.  These  estimators 
are  known  as  graphical  lasso  (Glasso)  estimators  and  are  the 
solution  to  the  t-\  -penalized  minimization  problem: 

E„  G  arg  min  {-/(E)  +  A | E _  1 1 1 } ,  (3) 

SeS++ 

where  A  >  0  is  a  regularization  parameter.  If  A  >  0  and  S„ 
is  positive  definite,  then  E„  in  (3)  is  the  unique  minimizer. 

A  fast  iterative  algorithm,  based  on  a  block  coordinate 
descent  approach,  exhibiting  a  computational  complexity 
0((pf)3),  was  developed  in  [8]  to  solve  the  convex  program 

(3).  Under  the  assumption  A  x  ^/loeh'f)  sojutjon  0f  (3)  was 
shown  to  have  high  dimensional  convergence  rate  [3]: 


l|G(S„,  A)  -  Boll,  =  Op  (JW±S^MT\  (4) 

where  s  is  an  upper  bound  on  the  number  of  non-zero  off- 
diagonal  elements  of  ©q.  When  s  =  0(pf),  this  rate  is  better 
than  the  non-regularized  sample  covariance  estimator: 


||Sn  ~  E0  ||  F  =  Op 


(5) 


4.  KRONECKER  GRAPHICAL  LASSO 

Let  So  :=  Ao  0)  Bo  denote  the  true  covariance  matrix,  where 
Aq  :=  X{  1  and  Bo  =  Y,{ 1  are  the  true  Kronecker  factors. 
Let  A  inn  denote  the  initial  guess  of  Aq  =  Xjj-1. 

Define  J(X,  Y)  as  the  negative  log-likelihood 

J(X,  Y)  =  tr((X  0  Y)S„)  -  / log det(X) 

—  plogdet(Y)  (6) 

Although  the  objective  (6)  is  not  jointly  convex  in  (X,  Y),  it 
is  biconvex.  This  motivates  the  flip-flop  algorithm  [1].  Adapt¬ 
ing  the  notation  from  [1],  define  the  mappings  A(-),  B(-): 

1  f  — 

A(B)  =  -Y/[V~1}k,iSn(l,k),  (7) 

" ; 1  k’i=i 

1  p 

v(A}=-  E  (8) 

^-^'7'  P  i,j= 1 
fxf 

where  Sn  =  K^fSnKpJ.  For  fixed  B  G  Sf++,  A(B)  in  (7) 
is  the  minimizer  of  J(A-1,  B_1)  over  A  G  S++.  A  similar 
interpretation  holds  for  (8).  The  flip-flop  algorithm  [1]  starts 
with  some  arbitrary  p.d.  matrix  Airnt  and  computes  B  using 
(8),  then  A  using  (7),  and  repeats  until  convergence.  The  FF 
algorithm  does  not  account  for  sparsity. 

If  ©0  =  Xu  0  Y 0  is  a  sparse  matrix,  which  implies  that  at 
least  one  of  Xo  or  Yo  is  sparse,  one  can  penalize  the  outputs 
of  the  flip-flop  algorithm  and  iteratively  minimize 

Jx(X,Y)  =  J(X,Y)  +  Xx\X\1  +  Xy\Y\1.  (9) 

This  leads  to  an  algorithm  that  we  call  KGlasso  (see  Algo¬ 
rithm  1),  which  sparsifies  the  Kronecker  factors  in  proportion 
to  the  parameters  Xx,  A y  >  0.  This  additive  penalty  was  first 
proposed  in  [9]  in  the  context  of  missing  data. 


Algorithm  1  Kronecker  Graphical  Lasso  (KGlasso) 

l:  Input:  S„,  p,  /,  n,  A.y  >  0,  Ay  >  0 
2:  Output:  &KGI  asso 

3:  Initialize  Ainlf  to  be  positive  definite  satisfying  Assump¬ 
tion  1. 

•|:  *  '  Amit 

5:  repeat 

6:  B  <-  i  £L= 1  ;x  i)  (see  Eq.  (7)) 

7:  Y  G-  G(B,  ^),  where  G(-,  •)  is  defined  in  (10) 

8:  A  <r-  j  J2l,l= 1  [Y]fc,;S„(/,  k)  (see  Eq.  (8)) 

9:  X<-G(A,^) 

10:  until  convergence 

11:  &  KGlasso  X  0  Y 


The  Glasso  mapping  (3)  is  written  as  G(-,  A)  :  Sd  — »  Sd, 

G(T,A)  =  arg  min  (tr(©T)  —  logdet(0)  +  A|©|i|. 

©eS^+  I-  J 

GO) 

As  compared  to  the  0{p4f4)  computational  complexity  of 
Glasso  [8],  KGlasso  has  a  computational  complexity  of  only 

o(P4  +  f4). 

Assuming  Sn  is  p.d.,  KGlasso  converges  to  a  critical  point 
of  the  objective  function  [10].  Under  a  mild  assumption  on 
the  starting  point,  KGlasso  can  be  shown  to  converge  to  a 
local  minimum  [10]. 


the  computational  complexity  for  FF  is  0(p 2  +  f2)  which  is 
less  than  the  0(p2f2)  complexity  of  SCM,  by  exploiting  Kro- 
necker  structure  FF  simultaneously  achieves  improved  MSE 
performance  and  reduced  computational  complexity. 

6.  HIGH  DIMENSIONAL  CONSISTENCY  OF 
KGLASSO 

In  this  section,  high  dimensional  consistency  is  established 
for  KGlasso  as  n,p,f—>  oo. 

Define  © KGlasso{k)  as  the  output  of  the  fcth  KGlasso  it¬ 
eration. 


5.  HIGH  DIMENSIONAL  CONSISTENCY  OF  FF 


In  this  section,  we  show  that  the  flip-flop  (FF)  algorithm 
achieves  the  optimal  (non-sparse)  statistical  convergence  rate 

of  Op  (^/p2+f^J  (up  to  a  log-factor).  This  result  (Thm. 

1)  allows  us  to  establish  that  the  proposed  KGlasso  has  sig¬ 
nificantly  improved  MSE  convergence  rate  (Thm  2).  We 
make  the  following  standard  assumption  on  the  spectra  of  the 
Kronecker  factors. 


Assumption  1.  Uniformly  Bounded  Spectra 

There  exist  absolute  constants  kA,kA,kB>  k Bih.Ai  it >  k Ainit 

such  that: 


la.  0  <  k_A  —  ^mm(Ao)  -^max(Ao)  L  k A  <’'  tX) 

lb.  0  <  kp  A  Am,n(B0)  ^  Amax(Bo)  kp  ^  oo 

2.  0  k_A  ina  —  ^min  (A-init')  —  ^max^^-init 

kj 4init  <  OO 


< 


Let  Rff(3)  :=  A(B(Ainit))®B(A(B(Ainit)))  denote 
the  3-step  (noniterative)  version  of  the  flip-flop  algorithm  [1], 
More  generally,  let  TIff  (k)  denote  the  fc-step  version  of  the 
flip-flop  algorithm.  Let  &pp(k)  =  (R ff(A;))-1. 


Theorem  1.  Let  Ao,  Bo,  and  Ajri,;t  satisfy  Assumption  1  and 
define  M  =  max(p,  /,  n).  Assume  p  >  f  >  2  and  p  log  M  < 
C"  n  for  some  finite  constant  C"  >  0.  Finally,  assume  n  > 
j  +  1.  Then,  for  k  >  2  finite, 


||©FF(fc)  ~  Q0||f  =  Op  ^ J l0gM ^  (11) 

as  n  — >  oo. 

Proof.  Due  to  space  limitations  the  proof  is  given  in  [  10] .  □ 

The  bound  (11)  specifies  the  rate  of  reduction  of  the  es¬ 
timation  error  for  the  multi-iteration  FF  algorithm,  which  in¬ 
cludes  the  three  step  FF  algorithm  ( k  =  3)  [1]  as  a  special 
case.  The  error  reduction  decreases  as  long  as  p  and  /  do  not 
increase  too  quickly  in  n. 

Note  that  (11)  specifies  a  faster  rate  than  that  of  the  naive 
sample  covariance  matrix  estimator  (5).  Furthemore,  since 


Theorem  2.  Let  Ao,Bo,A satisfy  Assumption  1.  Let 
M  =  max(p,  /,  n).  Let  Ay  ^  x  P'J^np1  anc^  x 

p ,  /,  n  -y  oo  for  all  k  >  1  and  k'  >  2.  Assume  sparse 
Xq  and  Yo,  i.e.  Sx0  =  O(p),sy0  =  O(f).  Assume 
max  (j;  p'j  log M  =  o(n).  Then,  for  k  >  2  finite,  we 
have 

|| &KGlasso(k)  -  ©0 1| F  =  Op  (J2) 

as  n  —y  oo. 

Proof.  The  proof  uses  results  from  large  deviation  theory.  See 

[10].  □ 

Thm.  2  generalizes  Thm.  1  to  the  case  of  sparse  Kro¬ 
necker  structure.  Comparison  between  the  error  expressions 
(4),  (11)  and  (12)  show  that,  by  exploiting  both  Kronecker 
structure  and  sparsity,  KGlasso  can  attain  significantly  lower 
estimation  error  than  standard  Glasso  [3]  and  FF  [1], 

7.  SIMULATION  RESULTS 

In  this  section,  we  compare  the  KGlasso  algorithm  with  the 
flip-flop  (FF)  algorithm  [1]  that  iteratively  computes  the  ML 
solution.  The  Glasso  algorithm  implementation  used  was 
based  on  [8]  with  a  stopping  criterion  determined  by  when 
the  duality  gap  falls  below  a  threshold  of  10-5. 

To  empirically  evaluate  performance,  Monte  Carlo  simu¬ 
lations  were  used.  The  true  matrices  Xo  :=  A0  1  and  Yo  := 
Bq-1  were  unstructured  randomly  generated  positive  definite 
matrices  based  on  an  Erdos-Renyi  graph  model.  Performance 
assessment  was  based  on  normalized  Frobenius  norm  error  in 
the  covariance  and  precision  matrix  estimates.  The  normal¬ 
ized  error  was  calculated  using 

where  Nmc  is  the  number  of  Monte  Carlo  runs  and  £(i)  is 
the  covariance  output  from  the  ith  trial  run.  1 

1  The  same  formula  can  be  adapted  to  calculate  the  normalized  error  in  the 
precision  matrix  ©o . 


Nmc 


Nmc 


2-*ii= t 


l|So-S(t) 

IISollJ, 


In  our  implementation  of  KGlasso,  the  regularization 
parameters  were  chosen  as  follows.  The  initialization  was 
=  Ip.  The  regularization  parameters  were  selected  as 


\(1)  _  „ 
ay  —  Cy 


log  M  \(2)  _ 
np  ’  AX  —  C a 


log  M  ^(1)  ^(2)  _  ^(2) 


nf 


'X 


A ^  =  A^\  etc.  We  set  cx  =  cy  =  0.4. 

We  consider  the  setting  where  X0  and  Y0  are  large  sparse 
matrices  of  dimension  p  =  f  =  60  (see  Fig.  1).  The  dimen¬ 
sion  of  @o  is  d  =  3,  600,  which  is  too  large  for  standard 
Glasso  to  handle.  Thus,  it  is  not  shown. 

Figure  2  compares  the  root-mean  squared  error  (RMSE) 
performance  in  precision  and  covariance  matrices  as  a  func¬ 
tion  of  n.  As  expected,  KGlasso  outperforms  FF  [1]  over  the 
exhibited  range  of  n  for  both  the  covariance  and  the  inverse 
covariance  estimation  problem. 


Fig.  1.  Sparse  Kronecker  matrix  representation.  Left  panel: 
left  Kronecker  factor.  Right  panel:  right  Kronecker  factor. 


Fig.  2.  Normalized  RMSE  performance  for  precision  matrix 
(top)  and  covariance  matrix  (bottom)  as  a  function  of  n.  For 
n  =  10,  there  is  a  69%  RMSE  reduction  for  the  precision 
matrix  and  36%  RMSE  reduction  for  the  covariance  matrix 
when  using  KGlasso  instead  of  FF. 


8.  CONCLUSION 

We  considered  high-dimensional  estimation  of  a  Kronecker- 
decomposable  covariance  matrix  given  i.i.d.  Gaussian  sam¬ 


ples.  A  (| -penalized  likelihood  approach  was  proposed  for 
estimating  the  covariance  matrix  when  the  kronecker  factors 
are  sparse.  This  led  to  an  iterative  algorithm  (KGlasso)  that 
takes  advantage  of  structure  and  sparsity.  A  tight  MSE  con¬ 
vergence  rate  was  derived  for  KGlasso,  showing  significantly 
better  MSE  performance  than  standard  Glasso  and  FF  [1]. 
Simulations  validated  our  theoretical  predictions. 
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